Analysis of Open-Source Embedding Models Using MRR
Mean Reciprocal Rank (MRR) is a statistical measure used to evaluate the performance of information retrieval systems. It is commonly used when the system returns a ranked list of results for a set of queries.
For each query, the reciprocal rank is the inverse of the rank at which the first relevant document appears. The mean of these reciprocal ranks across all queries gives the Mean Reciprocal Rank.
Formula
Mean Reciprocal Rank (MRR) is calculated using the following formula:
MRR = (1 / Q) * ∑ (1 / rankᵢ), for i = 1 to Q
Where:
- Q = total number of queries
- rankᵢ = rank position of the first relevant document result for the i-th query
- ∑ = summation
- If a relevant document is not found in the top-k results, the reciprocal rank is treated as 0 for that query.
Example
Suppose we have 3 queries and their first relevant result appears at the following ranks:
- Query 1: Rank 1 → Reciprocal Rank = 1/1 = 1.0
- Query 2: Rank 3 → Reciprocal Rank = 1/3 = 0.333
- Query 3: Not Found → Reciprocal Rank = 0.0
MRR = (1.0 + 0.333 + 0.0)/3 = 0.444
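The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch, not the notebook's evaluation code: the helper name `mean_reciprocal_rank` is hypothetical, and encoding a missing result as `None` is an assumption made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average 1/rank over all queries; None (not found in top-k) contributes 0."""
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


# Ranks from the example: query 1 -> rank 1, query 2 -> rank 3, query 3 -> not found
example_ranks = [1, 3, None]
print(f"{mean_reciprocal_rank(example_ranks):.3f}")  # 0.444
```

Note that the denominator is the total number of queries, so queries with no relevant result still lower the score.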
Why Use MRR?
MRR focuses on the rank of the first correct result, making it especially useful for applications like:
- Question answering
- Search engines
- Recommendation systems
Higher MRR means relevant results are appearing sooner in the ranked list, which implies a better user experience.
🧪 Open-Source Models Used in This Experiment
Model | Model Description | Key Embedding Strengths | Suited for Retrieval and/or Search |
---|---|---|---|
bert-base-uncased | Original BERT model by Google (12-layer, uncased). Trained on BooksCorpus and Wikipedia. | Strong contextual token representations; general-purpose; needs pooling for sentence-level use. | ✔️❌ |
roberta-base | Improved BERT variant by Facebook; trained on more data without NSP objective. | Better performance than BERT in many tasks; robust embeddings; also needs pooling for use. | ✔️❌ |
sentence-transformers/all-MiniLM-L6-v2 | Lightweight Sentence-BERT model fine-tuned for semantic similarity tasks. | Highly efficient sentence embeddings; optimized for semantic search and retrieval. | ✔️✔️ |
colbert-ir/colbertv2.0 | Late-interaction retrieval model designed for scalable, fine-grained document retrieval. | Per-token embeddings; strong for passage-level retrieval; excels in re-ranking pipelines. | ✔️✔️ |
xlnet/xlnet-base-cased | Permutation-based transformer model; aims to capture bidirectional context without masking. | Powerful language modeling; less commonly used for embeddings; lacks fine-tuned pooling. | ✔️❌ |
What is “Pooling” and Why Is It Needed for Sentence-Level Use?
Transformer models like BERT and RoBERTa generate token-level embeddings: for each word or subword token in the input text, the model outputs a separate embedding vector.
However, for many downstream tasks such as:
- Semantic search
- Information retrieval
- Sentence similarity
- Clustering or classification
...you need a single fixed-size vector to represent the entire sentence or document.
That’s where pooling comes in.
Pooling is the process of combining the multiple token embeddings into one sentence-level embedding.
Common Pooling Methods:
Pooling Method | Description |
---|---|
[CLS] Token | Use the embedding of the [CLS] token (first token) as a summary representation. Often used with BERT. |
Mean Pooling | Average all token embeddings (optionally ignoring padding tokens). Provides a balanced sentence representation. |
Max Pooling | Take the max value across all tokens for each dimension. Highlights the strongest signals. |
Attention Pooling | Learn a weighted average using attention scores — more advanced, often requires fine-tuning. |
“Pooling is needed for sentence-level use” means that the token-level output of BERT/RoBERTa must be manually converted into a single vector that represents the entire sentence or passage (typically using mean, max, or CLS pooling) before these models can be used effectively for retrieval, search, or semantic comparison tasks.
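The pooling methods described above can be sketched on a toy array. This is an illustrative example under stated assumptions, not the notebook's code: it uses NumPy instead of PyTorch tensors, the embedding values are made up, and the function names `cls_pool`, `mean_pool`, and `max_pool` are hypothetical.

```python
import numpy as np

# Toy "last hidden state": 2 sentences, 4 token positions, 3 dimensions.
# Values are invented for illustration; a real model would produce these.
token_embeddings = np.array([
    [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [5.0, 2.0, 1.0], [0.0, 0.0, 0.0]],
    [[2.0, 2.0, 2.0], [4.0, 0.0, 6.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
])
# 1 marks a real token, 0 marks padding (the convention of a tokenizer's attention_mask).
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])


def cls_pool(tokens):
    """Use the first token's vector as the sentence embedding."""
    return tokens[:, 0, :]


def mean_pool(tokens, mask):
    """Average token vectors, counting only non-padding positions."""
    mask = mask[:, :, None].astype(tokens.dtype)
    return (tokens * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)


def max_pool(tokens, mask):
    """Take the per-dimension maximum over non-padding positions."""
    masked = np.where(mask[:, :, None] == 1, tokens, -np.inf)
    return masked.max(axis=1)


print(cls_pool(token_embeddings))
print(mean_pool(token_embeddings, attention_mask))
print(max_pool(token_embeddings, attention_mask))
# A plain mean over the token axis also averages in the padding rows:
print(token_embeddings.mean(axis=1))
```

The last line shows why masking matters: when sentences in a batch have different lengths, a plain mean over the token axis is diluted by padding positions, so shorter sentences get different vectors than they would if encoded alone.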
📝 Note: The code in this example applies only plain mean pooling: get_embeddings averages last_hidden_state over the token dimension, including padding positions. No fine-tuned or task-specific pooling layer is used.
%pip install -q transformers torch matplotlib seaborn numpy scikit-learn
%pip install -q prettytable
##version 0.3
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModel
from prettytable import PrettyTable
import textwrap
import torch
# List of models to evaluate
models = [
    "bert-base-uncased",
    "roberta-base",
    "sentence-transformers/all-MiniLM-L6-v2",
    "colbert-ir/colbertv2.0",
    "xlnet/xlnet-base-cased"
]
# Function to load models and tokenizers
def load_model(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    return tokenizer, model
# Function to compute embeddings
def get_embeddings(sentences, tokenizer, model):
    inputs = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        # Mean-pool the token embeddings over the sequence dimension
        # (padding positions are included in this simple average)
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)
    return embeddings
# Function to compute MRR and collect detailed info
def compute_detailed_mrr(retrieved_indices, relevant_indices, queries, relevant_sentences):
    mrr = 0.0
    details = []
    for query_idx, retrieved in enumerate(retrieved_indices):
        rank = None
        for r, index in enumerate(retrieved):
            if index in relevant_indices[query_idx]:
                rank = r + 1  # 1-based rank
                mrr += 1 / rank
                break
        rank_display = str(rank) if rank else "Not Found"
        query = queries[query_idx]
        correct = [relevant_sentences[i] for i in relevant_indices[query_idx]]
        retrieved_texts = [relevant_sentences[i] for i in retrieved]
        details.append({
            "query": query,
            "correct": correct,
            "retrieved": retrieved_texts,
            "rank": rank_display
        })
    return mrr / len(queries), details
# Display table function with content wrapping
def display_detailed_table(results, table_width=40, wrap_width=30):
    table = PrettyTable()
    table.field_names = ["Model", "Query", "Correct Results", "Top-K Retrieved", "Rank", "Mean Reciprocal Rank"]
    table.max_width = table_width
    for model_name, model_data in results.items():
        mrr = model_data["mrr"]
        for row in model_data["details"]:
            query_wrapped = textwrap.fill(row["query"], width=wrap_width)
            correct_wrapped = "\n".join([textwrap.fill(s, width=wrap_width) for s in row["correct"]])
            retrieved_wrapped = "\n".join([textwrap.fill(s, width=wrap_width) for s in row["retrieved"]])
            table.add_row([
                textwrap.fill(model_name, width=wrap_width),
                query_wrapped,
                correct_wrapped,
                retrieved_wrapped,
                row["rank"],
                f"{mrr:.3f}"
            ])
    print("\nDetailed Retrieval Results:")
    print(table)
# Main evaluation function
def evaluate_models(queries, relevant_indices, k=2):  # k=2: retrieve the top 2 sentences per query
    results = {}
    for model_name in models:
        print(f"Evaluating model: {model_name}")
        tokenizer, model = load_model(model_name)
        query_embeddings = get_embeddings(queries, tokenizer, model)
        dataset_embeddings = get_embeddings(relevant_sentences, tokenizer, model)
        similarities = cosine_similarity(query_embeddings, dataset_embeddings)
        retrieved_indices = np.argsort(-similarities, axis=1)[:, :k]
        mrr, details = compute_detailed_mrr(retrieved_indices, relevant_indices, queries, relevant_sentences)
        results[model_name] = {"mrr": mrr, "details": details}
    return results
# Sample queries and the sentence collection to search
queries = [
    "What are the health benefits of eating apples?",
    "How does climate change affect wildlife?",
    "What are some effective study techniques?",
    "Can you explain the theory of relativity?",
    "What are the advantages of renewable energy?"
]
relevant_sentences = [
    "Eating apples can lower cholesterol levels.",  # index 0: relevant to query 1
    "Climate change disrupts animal migration patterns.",  # index 1: relevant to query 2
    "Spaced repetition is a powerful study method.",  # index 2: relevant to query 3
    "The theory of relativity explains how time and space are interconnected.",  # index 3: relevant to query 4
    "Renewable energy reduces greenhouse gas emissions.",  # index 4: relevant to query 5
    "Apples are rich in vitamins and dietary fiber.",  # index 5: relevant to query 1
    "Wildlife is increasingly affected by habitat loss due to climate change.",  # index 6: relevant to query 2
    "Active recall helps improve memory retention.",  # index 7: relevant to query 3
    "Einstein's theory revolutionized physics.",  # index 8: relevant to query 4
    "Solar panels provide a sustainable energy source."  # index 9: relevant to query 5
]
# Assumed ground-truth relevant indices for the MRR calculation
relevant_indices = [
    [0, 5],  # Relevant sentence indices for query 1 [Apples]
    [1, 6],  # Relevant sentence indices for query 2 [Climate change]
    [2, 7],  # Relevant sentence indices for query 3 [Study techniques]
    [3, 8],  # Relevant sentence indices for query 4 [Relativity]
    [4, 9]   # Relevant sentence indices for query 5 [Renewable energy]
]
# Evaluate models
results = evaluate_models(queries, relevant_indices)
# Plot MRR results
plt.figure(figsize=(12, 6))
bar_plot = sns.barplot(x=list(results.keys()), y=[v["mrr"] for v in results.values()], palette="viridis")
plt.title('MRR of Open-Source Embedding Models', fontsize=16)
plt.xlabel('Model', fontsize=14)
plt.ylabel('Mean Reciprocal Rank (MRR)', fontsize=14)
plt.xticks(rotation=45)
# Annotate bars
for p in bar_plot.patches:
    bar_plot.annotate(f'{p.get_height():.2f}',
                      (p.get_x() + p.get_width() / 2., p.get_height()),
                      ha='center', va='bottom', fontsize=10)
plt.tight_layout()
plt.show()
# Print PrettyTable output
display_detailed_table(results, table_width=40, wrap_width=30)
Evaluating model: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Evaluating model: roberta-base
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Evaluating model: sentence-transformers/all-MiniLM-L6-v2
Evaluating model: colbert-ir/colbertv2.0
Some weights of the model checkpoint at colbert-ir/colbertv2.0 were not used when initializing BertModel: ['linear.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Evaluating model: xlnet/xlnet-base-cased
Some weights of the model checkpoint at xlnet/xlnet-base-cased were not used when initializing XLNetModel: ['lm_loss.weight', 'lm_loss.bias']
- This IS expected if you are initializing XLNetModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Detailed Retrieval Results:
Model | Query | Correct Results | Top-K Retrieved | Rank | Mean Reciprocal Rank |
---|---|---|---|---|---|
bert-base-uncased | What are the health benefits of eating apples? | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | 1 | 1.000 |
bert-base-uncased | How does climate change affect wildlife? | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | 1 | 1.000 |
bert-base-uncased | What are some effective study techniques? | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | Spaced repetition is a powerful study method. Solar panels provide a sustainable energy source. | 1 | 1.000 |
bert-base-uncased | Can you explain the theory of relativity? | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | 1 | 1.000 |
bert-base-uncased | What are the advantages of renewable energy? | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | Solar panels provide a sustainable energy source. Renewable energy reduces greenhouse gas emissions. | 1 | 1.000 |
roberta-base | What are the health benefits of eating apples? | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | Apples are rich in vitamins and dietary fiber. Eating apples can lower cholesterol levels. | 1 | 1.000 |
roberta-base | How does climate change affect wildlife? | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | 1 | 1.000 |
roberta-base | What are some effective study techniques? | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | Spaced repetition is a powerful study method. Climate change disrupts animal migration patterns. | 1 | 1.000 |
roberta-base | Can you explain the theory of relativity? | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | 1 | 1.000 |
roberta-base | What are the advantages of renewable energy? | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | Solar panels provide a sustainable energy source. Spaced repetition is a powerful study method. | 1 | 1.000 |
sentence-transformers/all-MiniLM-L6-v2 | What are the health benefits of eating apples? | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | Apples are rich in vitamins and dietary fiber. Eating apples can lower cholesterol levels. | 1 | 1.000 |
sentence-transformers/all-MiniLM-L6-v2 | How does climate change affect wildlife? | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | Wildlife is increasingly affected by habitat loss due to climate change. Climate change disrupts animal migration patterns. | 1 | 1.000 |
sentence-transformers/all-MiniLM-L6-v2 | What are some effective study techniques? | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | 1 | 1.000 |
sentence-transformers/all-MiniLM-L6-v2 | Can you explain the theory of relativity? | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | 1 | 1.000 |
sentence-transformers/all-MiniLM-L6-v2 | What are the advantages of renewable energy? | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | 1 | 1.000 |
colbert-ir/colbertv2.0 | What are the health benefits of eating apples? | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | 1 | 1.000 |
colbert-ir/colbertv2.0 | How does climate change affect wildlife? | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | Wildlife is increasingly affected by habitat loss due to climate change. Climate change disrupts animal migration patterns. | 1 | 1.000 |
colbert-ir/colbertv2.0 | What are some effective study techniques? | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | Spaced repetition is a powerful study method. Einstein's theory revolutionized physics. | 1 | 1.000 |
colbert-ir/colbertv2.0 | Can you explain the theory of relativity? | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | 1 | 1.000 |
colbert-ir/colbertv2.0 | What are the advantages of renewable energy? | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | 1 | 1.000 |
xlnet/xlnet-base-cased | What are the health benefits of eating apples? | Eating apples can lower cholesterol levels. Apples are rich in vitamins and dietary fiber. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | Not Found | 0.400 |
xlnet/xlnet-base-cased | How does climate change affect wildlife? | Climate change disrupts animal migration patterns. Wildlife is increasingly affected by habitat loss due to climate change. | Einstein's theory revolutionized physics. Wildlife is increasingly affected by habitat loss due to climate change. | 2 | 0.400 |
xlnet/xlnet-base-cased | What are some effective study techniques? | Spaced repetition is a powerful study method. Active recall helps improve memory retention. | Einstein's theory revolutionized physics. Spaced repetition is a powerful study method. | 2 | 0.400 |
xlnet/xlnet-base-cased | Can you explain the theory of relativity? | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | 1 | 0.400 |
xlnet/xlnet-base-cased | What are the advantages of renewable energy? | Renewable energy reduces greenhouse gas emissions. Solar panels provide a sustainable energy source. | The theory of relativity explains how time and space are interconnected. Einstein's theory revolutionized physics. | Not Found | 0.400 |
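As a check on the reported score, the xlnet/xlnet-base-cased value can be recomputed by hand from the Rank column above, treating "Not Found" as a reciprocal rank of 0 as in the formula section. The per-query ranks are Not Found, 2, 2, 1, Not Found:

```latex
\mathrm{MRR}_{\text{xlnet}}
  = \frac{1}{5}\left(0 + \frac{1}{2} + \frac{1}{2} + 1 + 0\right)
  = \frac{2}{5}
  = 0.400
```

The other four models return a relevant sentence at rank 1 for all five queries, so each of their sums is 5 and the MRR is 5/5 = 1.000.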